Estimating GRACE terrestrial water storage anomaly using an improved point mass solution

The availability of terrestrial water storage anomaly (TWSA) data from the Gravity Recovery and Climate Experiment (GRACE) supports many hydrological applications. Five TWSA products are operational and publicly available, including three based on mass concentration (mascon) solutions and two based on the synthesis of spherical harmonic coefficients (SHCs). The mascon solutions have advantages regarding the synthesis of SHCs since the basis functions are represented locally rather than globally, which allows geophysical data constraints. Alternative new solutions based on SHCs are, therefore, critical and warranted to enrich the portfolio of user-friendly TWSA data based on different algorithms. TWSA data based on novel processing protocols is presented with a spatial re-sampling of 0.25 arc-degrees covering 2002–2022. This approach parameterizes the improved point mass (IPM) and adopts the synthesized residual gravitational potential as observations. The assay indicates that the proposed Hohai University (HHU-) IPM TWSA data reliably agree with the mascon solutions. The presented HHU-IPM TWSA data set would be instrumental in regional hydrological applications, particularly enabling improved assessment of regional water budgets.

The methodology proposed to estimate TWSA here is based on the inversion of the synthesized residual gravitational potential at GRACE satellites' altitude parameterized by the improved point mass (IPM) 27 . The advantage of such a solution is the possibility of applying spatial or spatiotemporal constraints, which otherwise would be complex based on the synthesis of SHCs 29 . Since most users would be unable to generate the residual gravitational potential based on Level-1b data, unfiltered SHCs can be used for this purpose [30][31][32][33][34][35] . Hence, the observations consist of the synthesized residual gravitational potential at GRACE's altitude using the SHCs provided by COST-G. The choice for the COST-G solution relies on the advantage of using the statistical properties of different solutions, resulting in a reduced noise level compared to the individual solutions 16 . The generated TWSA data comprise the monthly grids at a spatial resolution of 0.25 arc-degree covering April 2002 to May 2022 36 . The new Level-3 product generated at Hohai University (HHU), namely HHU-IPM 36 , was compared with mascon solutions provided by the CSR (CSR-M 9 ), GSFC (GSFC-M 13 ), and JPL (JPL-M 10 ). Overall, the results of the intercomparisons show that the estimated TWSA data present a root-mean-square error (RMSE) at the same level as those of mascon solutions.
Currently, only CSR-M, GSFC-M, and JPL-M are updated regularly and available for users. These solutions are based on different regularizations, parametrizations, and constraint schemes. Consequently, intercomparisons of mascon solutions in terms of long-term trends in TWSA have shown that those for JPL-M are higher than CSR-M, up to 30% 37 . Conversely, the long-term trends based on synthesized TWSA are generally 15% lower than those based on CSR mascon 37 . Consequently, applying different TWSA products based on mascon solutions and SHCs would lead to different results and conclusions. Hence, increasing the portfolio of TWSA products based on different inversion schemes would benefit the end users. This is because the intercomparisons and overall variability among the different data sets may indicate the uncertainties of the respective TWSA products. The present study's primary goal is to present an HHU-IPM data set containing TWSA monthly grids from April 2002 to May 2022 36 . The HHU-IPM data set has a potential reuse value among hydrologists. For instance, the proposed HHU-IPM data set 36 was used to optimize the parameters of hydrologic model 38 , which significantly supports hydrological modeling and forecasting.

Methods
Input data sets. All data sets used in this study as inputs (Table 1)  The GRACE/GRACE-FO COST-G 16 (http://cost-g.org/products/) monthly solutions are given up to degree and order 90, and truncated here at degree 60 were used. COST-G is the International Gravity Field Service (IGFS) product center under the International Association of Geodesy (IAG) umbrella. The Level-2 products consisting of monthly SHCs are generated by COST-G while combining existing solutions from its analysis centers and partner analysis centers. The analysis centers consist of the Astronomical Institute of the University of Bern (AIUB), the Center National d'Etudes Spatiales (CNES), the GFZ, and the Institute of Geodesy of the Graz University of Technology (ITSG). The COST-G's partner analysis centers consist of the CSR and the JPL. The advantage of using the COST-G product compared with the individual ensemble members is that it effectively reduces the noise level 39 and improves signal quality 40 . The COST-G Level-2 product covers the period from April 2002 to May 2022. The geocentric gravitational constant GM E associated with the COST-G solution refers to the value of 3.986004415 × 10 14 m 3 s −2 and the Earth's equatorial radius a E of 6,378,136.3 m.
The GIA model ICE-6G-D 44 post-processed by Dahle & Murboeck 45 (https://dataservices.gfz-potsdam.de/ gravis/showshort.php?id=escidoc:4175889) was considered and used to remove the secular trends of the GIA process on GRACE measurements. Since the GIA is given as trends of SHCs per year, we used the middle point between January 2004 and December 2009 (i.e., December 31, 2006) for computing the equivalent monthly corrections.
Additionally, the aliased signal of the S2 tide by fitting a model with a constant, linear trend, annual, semi-annual, and 161-day components was removed from the monthly residual SHCs.
Residual gravitational potential and TWSA. The relationship between TWSA (symbolically given by Δr) and the residual gravitational potential (δV) is given by Newton's integral, for which one of the numerical solutions is the IPM solution as:¯V with a further explanation available in Ferreira et al. 27 . In Eq. (1), G is Newton's constant of gravitation, ρ w is the density of fresh water, ϕ Δ and Δλ denote the latitudinal and longitudinal dimensions of a grid cell i, and r, ϕ and λ are the geocentric distance, spherical latitude, and longitude, respectively, for the observed residual gravitational potential. The coefficients K i, j appearing in Eq. (1) are computed as ( at which they are evaluated at the geometric center of a grid cell defined by 0 ϕ and λ 0 with ψ being the spherical distance between the computation point (r, , ϕ λ) and the running point ( ϕ λ R, , 0 0 ). In Eq. (1), the fourth-order and higher terms are omitted as indicated by O(Δ 4 ).
Here, the Earth's surface is represented by an ellipsoid with a semi-major axis a equals to 6,378,137 m, and the first eccentricity squared e 2 equals to 6.69437999014 × 10 −3 . The ellipsoidal surface was discretized into cells with dimensions Δϕ = 1° and Δλ = 1°. Noteworthy, Eqs. (1, 2) require spherical latitudes (ϕ and ϕ 0 ), whereas the curvilinear coordinates for an ellipsoidal surface use the geodetic latitude (ϕ). Hence, the ellipsoidal coordinates (ϕ, λ, h) were transformed into spherical coordinates (r, , ϕ λ) using an intermediate transformation to Cartesian coordinates. This means that the geocentric distance r and the mean Earth's radius R are now latitude-dependent distances. So, R is now the radial distance of the points at the ellipsoidal surface, which depends on the latitude and can be used to locally approximate the ellipsoid by a sphere. This local radius is the radius of the approximating sphere.
The residual gravitational potential δV can be computed based on Level-1b or Level-2 data sets retrieved from GRACE's measurements. Due to computational limitations, in this work, the residual gravitational potential was synthesized from the unfiltered Level-2 products (i.e., SHCs). Given the residual SHCs ΔC nm and S nm Δ as mentioned above, the residual gravitational potential can be synthesized using the following expression (cf. Wahr et al. 47 , GM E and a E refers to the geocentric gravitational constant (3.986004415 × 10 14 m 3 s −2 ) and the Earth's equatorial radius (6,378,136.3 m) for the COST-G geopotential model, n and m are the degrees and orders, respectively, P nm are the associated Legendre functions, and the degree-dependent load Love numbers k n accounting for the instantaneous reaction of the solid Earth to change surface loads. This solution was formed by a set of monthly residual SHCs complete to a maximum degree (n max ) equals to 60, and no filtering was applied. The residual SHCs were computed by subtracting the mean value regarding a temporal baseline from January 2004 to December 2009 from the respective monthly values. As in Eq. (1), Eq. (3) also requires spherical coordinates, which are computed from the ellipsoidal ones using Cartesian coordinates.
Parameter estimation using a priori constraint. Equation (1) represents the functional mode for estimating TWSA (Δr) given the residual gravitational potential δV as the observations. Consequently, the linear system can be expressed as 48 : where d is the n × 1 vector of the residual gravitational potentials (i.e., the "observations"), m is the m × 1 vector of the unknown TWSA (Δr), G is the n × m coefficient matrix, a matrix with the partial derivatives with respect to the model parameters Δr, the unknowns, and it is given as: In Eq. (4), e is the n × 1 residual vector of the observations with zero expectation and variance d 2 σ , and I is the identity cofactor matrix. However, estimating TWSA at the Earth's surface, given the residual gravitational potential at GRACE's orbit as observations, is an ill-posed problem. This is because there is no unique solution (an infinite number of TWSA distributions can generate the same residual gravitational potential field). This inverse problem is a typical downward continuation problem in which its computation is unstable, and a small perturbation in the observations amplifies during the continuation. Furthermore, the matrix G is poorly conditioned, which requires special treatment in the following derivations.
can be given as 49 where α is a regularization parameter (or trade-off parameter). Both the regularization parameter α and the error variance d 2 σ are unknown in Eq. (8) and can be estimated as follows 49 : Of course, the solutions of Eqs. (8,9) require an iterative scheme in which initial values for α and σ d 2 should be provided at Eq. (8) to estimate m.
Without better knowledge about the covariance matrix C m , its elements can be computed using the analytical covariance function given as assuming that m is a weakly stationary stochastic process on the sphere. In Eq. (10), ψ is the the spherical distances, taken pairwise, between the elements i and j, σ m 2 is the variance, and D is the correlation distance. The analytical covariance function (10) can be used to estimate the unknowns σ m and D taking the values of the empirical covariance function Again, once the empirical covariances C m are determined for the respective distances kΔψ, they can be used in Eq. (10) to estimate σ m and D. These parameters are estimated by solving the nonlinear least squares problem, which minimizes the sum of the squares of the residues.
The a priori information m 0 used here was the synthesized TWSA fields from the filtered SHCs (i.e., the filtered COST-G). Given the filtered residual SHCs ΔC nm f and S nm f Δ complete to degree 60, the synthesized TWSA is given as In Eq. (12), ρ E stands for the mean density of the Earth. The SHCs were de-correlated using a polynomial 50 and smoothed using the 400-km Gaussian filter 47 providing C nm and S nm f Δ necessary in Eq. (12). Noteworthy, the synthesized TWSA from GRACE SHCs can provide the best a priori for the parameters since it does not rely on modeling hypotheses. Furthermore, the results from Eq. (12) do not fully reconstruct the values from Eq. (3) due to the post-processing of the SHCs used on the former. Hence, it is assumed that the observations d in Eq. (4) and the a priori information m 0 in Eq. (6) as being independent.
Computational procedure. A conceptual flowchart presenting the overall computational steps is shown in Fig. 1.
The first step is to obtain the GRACE Level-2 product, here represented by the COST-G solution. The GIA effects are accounted for, the low-degree coefficients replaced, the long-term mean (2004-2009) subtracted from the respective monthly solutions, and the aliased signal of the S2 tide removed. The residual gravitational potential grids for the respective months are then synthesized at 500 km altitude at geographical bins separated by 3 arc-degree using Eq. (3). No filtering is applied to synthesize the residual gravitational potential. However, a pre-processing step for the residual SHCs at the respective epochs is necessary for synthesizing the TWSA (2023) 10:234 | https://doi.org/10.1038/s41597-023-02122-1 www.nature.com/scientificdata www.nature.com/scientificdata/ grids as indicated in Eq. (12). The respective monthly TWSA grids with the oceans and the glaciated regions masked out are used as a priori information m 0 in Eqs. (8,9). These results in 14,438 parameters covering the lands. The gravitational potentials due to the 50,362 cells covering the oceans and glaciated regions are forwarded using Eq. (1) and then subtracted from the synthesized potentials to form the observations d necessary in Eqs. (8,9). The observation vector d contains 5,400 elements.
In the next step, the correspondingly monthly empirical covariance functions as per Eq. (11) are computed using the a priori information m 0 as an initial guess (i.e., m (0) = m 0 ). This is followed by estimating the respective σ m 2 (the variance) and D (the correlation distance) by solving the nonlinear least squares (nonlinear data-fitting) problem. This can be conducted using the trust-region reflective algorithm 51 . Subsequently, the respective covariance matrices C m , as per Eq.
() is less than a given tolerance δ. The tolerance value adopted in this study was 1 × 10 −6 mm, which converged with six to ten iterations. The converged solution provides the product IPM containing the TWSA fields at 1 arc-degree resolution.
The final product IPM containing the TWSA fields is partitioned from 1 arc-degree cell to sub-cells with 0.25 arc-degrees to describe better the coastlines (or any boundaries, e.g., a river basin) and for comparison purposes against the mascon solutions. The re-sampling considers the volume conservation in the form of synthesize the residual potential at 500 km on 3°×3° grid (Eq. 3) low-degree coefficients (degrees one and two) COST-G spherical harmonic coefficients Glacial Isostatic Adjustment (GIA) spherical harmonic coefficients replace the low-degree coefficients, correct for the S2 tide and GIA, and remove the long-term mean compute the empirical covariances (Eq. 11), grouped in an interval of the distance fit the analytical covariance function (Eq. 10) to the empirical covariances using a trust-region reflective model in which Δr is the TWSA and A is the area of the original cell (e.g., 1-by-1 arc-degree), and δr i and a i are the respective TWSA and area values for the sub-cells j (the smaller cells with 0.25-by-0.25 arc-degrees) within the original cell. This can be solved through least squares at which the observation is the product Δr·A (one observation), and the parameters are the N values of δr (16 for a cell with 1 arc-degree partitioned into sub-cells with 0.25 arc-degrees). This simple linear system with one equation and 16 unknowns can be solved by minimizing the norm of the parameters since several solutions exist to this problem. Albeit the HHU-IPM's TWSA product is provided at 0.25-by-0.25 arc-degrees, the resolution of the data is still intrinsic to the nominal resolution of GRACE, which is 3-by-3 arc-degrees. This is compatible with the maximum degree of expansion used in Eq. (3), which the spatial (half) wavelength is given by π/n max . That is, Eq. (13) does not add high-frequency content (small-scale features) on the re-sampled TWSA. The re-sampling by Eq. (13) provides only spatial match-ups of the data sets for evaluation and regional averaging purposes.
MATLAB functions to perform the inversion (Fig. 1) are available for download 28 .

Data Records
The The TWSA estimations were corrected for a GIA model (Table 1) of secular trends. However, the impacts of geophysical signals due to co-seismic and/or post-seismic deformation from major offshore earthquakes were not considered. Therefore, water mass gain/loss analysis must be treated with caution in land areas near large earthquakes (e.

Technical Validation
The HHU-IPM 36 aims to provide users with an alternative solution for the existing GRACE TWSA data sets. However, an important question is whether the algorithm's outcome summarized in Fig. 1 is akin to those based on mascon solutions, which are widely used at global and basin scales. The following sub-sections present ancillary data sets and the plausibility assay based on the temporal decomposition of the time series for uncertainties assessment and long-term changes of the TWSA products. Albeit mascon data were used in the comparisons, they are not avouched here as ground truth.

Supporting data sets. GRACE solutions. The GRACE mascon solutions CSR-M 9 , GSFC-M 13 , and JPL-M 10
were used for internal comparison purposes. Their main characteristics are presented in Table 2.
The three mascon solutions, CSR-M, GSFC-M, and JPL-M, are given at a sampling of 0.25, 0.5, and 0.5 arc-degrees (Table 2). Nevertheless, their respective resolutions during the inversion are equivalent to an equal-area spherical cap corresponding to a 3 arc-degree (central angle, "diameter") at the equator for JPL-M, and an equal-area hexagonal tile for CSR-M and an equal-area quadrangle cell for GSFC-M equals to a 1 arc-degree cell at the equator (approx. 12,400 km 2 ). Noteworthy, the nominal resolution of GRACE is approximately 330 km, which is mainly constrained by the satellites' altitude (400-500 km). However, Vishwakarma et al. 54 pointed out that basins with an area of approximately 63,000 km 2 (approx. 250 × 250 km) could be resolved with an overall error of about 20 mm. Nevertheless, a lower resolution could be possible depending on the amplitude of the TWSA signal 55 and/or processing strategy 56 . The GSFC-M and JPL-M were re-sampled from 0.5 arc-degrees to 0.25 arc-degrees using Eq. (13), that is, conserving the volume of the original cells (0.5-by-0.5 www.nature.com/scientificdata www.nature.com/scientificdata/ arc-degrees). Likewise, the GravIS TWSA solutions were re-sampled from 1.0 arc-degree to 0.25 arc-degrees through Eq. (13). Noteworthy, the re-sampling does not improve (or deteriorate) the TWSA fields.
Selected river basins. The river basins' delineations were downloaded from the Global Runoff Data Centre (GRDC, https://www.bafg.de), and further details can be found in reference 57 . Only 120 river basins with areas equal to or larger than 100,000 km 2 were considered and divided into size classes. The size classes include large basins, those with areas equal to or larger than 1,000,000 km 2 (basins 001-022), medium-size basins, those with areas equal to or larger than 360,000 km 2 and less than 1,000,000 km 2 (basins 023-050), and small basins those equal to or larger than 100,000 km 2 to less than 360,000 km 2 (basins 051-120). Noteworthy, the definition of large, medium, and small catchments did not consider the hydrological processes of the individual basins, and thereby the classification is just used in the context of the present work and GRACE's resolution.
Uncertainties in GRACE TWSA. The direct technical validation of TWSA products is impossible due to the nonexistence of ground-based measurements. Early studies have proposed alternative ways to evaluate different GRACE solutions 37,[58][59][60] . In this study, the assessment suggested by Groh et al. 60 was undertaken to evaluate the HHU-IPM TWSA product. Firstly, the residuals computed as the differences between the original and the detrended and de-seasoned series were smoothed using a 13-month moving average. Secondly, the difference between original and smoothed residuals provided the high-pass filtered residuals. This assessment considers only the monthly solutions from January 2004 to December 2010. The choice for this particular period for the evaluation relies on the fact that there are no missing months. Finally, the high-pass filtered residuals were then used to assess the noise level of the HHU-IPM solution based on the RMSE. RMSE values of 0 (zero) indicate a perfect fit 61  Overall, the magnitude of the RMSE values over the continents presents a spatial pattern following the TWSA variability (Fig. 2)  Nevertheless, evaluation at a regional and or local scale could provide different conclusions. For example, over areas with minimal mass variations, such as the Sahara Desert, the root-mean-square (RMS) of the detrended TSWA series is 4.79 mm for JPL-M, which outperforms CSR-M, GravIS, GSFC-M, and HHU-IPM by 117.0%, 152.8%, 173.6%, and 162.3%, respectively. (The evaluation over the Sahara Desert considered TWSA data within the region between the longitudes 3°W to 9°E and latitudes 24°N to 28°N, equivalent to 522,094.5 km 2 .) For a portion of the Gobi Desert delineated by the longitudes 100°E to 105°E and latitudes 40°N to 45°N (227,992.3 km 2 ), the RMS of the TWSA series is 6.36 mm for JPL-M. This figure is equivalent to an improvement of 37.7%, 81.4%, 115.3%, and 209.0% of JPL-M regarding the CSR-M, GravIS, GSFC-M, and HHU-IPM, respectively. The performance of JPL-M using areas with minimal TWSA variations might be influenced by the geophysical model-based a priori variance approach used in the inversion 11 . Overall, all solutions Table 2. The data sets used for comparison purposes. The GRACE solutions used for comparisons are the mascon data sets at CSR, GSFC and JPL, and GravIS at GFZ based on the COST-G Level-2b data. The GRDC's major river basins of the world. (*) The parameters of the respective mascon products were estimated at compartments with that area values.
www.nature.com/scientificdata www.nature.com/scientificdata/ performed well as analyzed by the low variability in the TWSA series with a noise level of less than 20 mm and less than 25 mm based on the temporal decomposition.
River-basin scale evaluation. Basin-level time series analysis was carried out over the 120 largest river basins (those with areas equal to or larger than 100,000 km 2 ). Further details about the basins can be found in Table 2 and ref. 57 . The RMSE values of the high-pass filtered residuals were computed for the respective basins' time series. Additionally, bootstrapping 63 was implemented to estimate confidence and significance for the RMSE values for the respective basins and solutions using 1,000 re-samplings. Figure (Figs. 2, 3). However, 21 basins, indicated by the shaded rectangles in Fig. 3, present RMSE values equal to or larger than the standard deviation of the respective TWSA time series for the JPL-M solution. This characterizes relatively high RMSE values, meaning noises dominate the TWSA series over the respective basins. For the other solutions, this figure is 12, 13, 10, and 4 for CSR-M, GravIS, GSFC-M, and HHU-IPM (Fig. 3).
Similar values for the median of the RMSE values of the GRACE solutions are seen for the large basins (001-022) as depicted in Fig. 4. That is, GRACE solutions are more consistent among them for large basins. The area-weighted average for the 22 largest basins results in 25.87 mm for HHU-IPM (Fig. 3a). This shows the outperformance of approximately 1.5%, 4.2%, and 3. www.nature.com/scientificdata www.nature.com/scientificdata/ equals to 24.57 mm. Furthermore, the scatter of the RMSE values over small basins (100,000 -360,000 km 2 ) is relatively larger regarding the other scales (Fig. 4).
Be it grid-or basin-scale evaluation, the performance values summarized in Figs. 2, 3 only highlight that the five solutions (CSR-M, GravIS, GSFC-M, HHU-IPM, and JPL-M) agree well among them without systematic differences (Fig. 3). This also highlights that the high-pass filtered residuals of the five solutions are not affected by interannual signals and or signals with frequencies higher than semi-annual. (Or they are impacting the RMSE estimations in the same amount for the five solutions, which deserves further investigation.) Hence, the variability among the five solutions, as summarized in Figs. 2-4, may suggest that the observed uncertainties of GRACE TWSA lie within an acceptable range for hydrological applications 37 . Consequently, uncertainties in GRACE solutions based on SHCs (e.g., HHU-IPM and GravIS) appear to be similar to those based on mascon approaches in terms of noise levels (Figs. 2-4). For the sake of example, Fig. 5 shows the TWSA time series for four river basins from the largest one (Amazon Basin, Fig. 5a) to the smaller one (Severn Basin, Fig. 5d). www.nature.com/scientificdata www.nature.com/scientificdata/ Overall, there is a good agreement among the five solutions and, as expected, over a large basin located at humid climate zone, GRACE products provide almost the same solution (Fig. 5a, see also Fig. 3). Conversely, differences are seen over the Severn Basin (basin n. 120), the smallest basin analyzed here (Fig. 5d). For instance, the lowest and the highest amplitudes are seen for the GSFC-M and JPL-M solutions with the other series lying between them. However, slight differences are seen over larger basins located at the semiarid climate zones as Murray Basin (Fig. 5b) and the Shatt al-Arab Basin (Fig. 5c) (Fig. 3). Figure 5 also indicates that the respective GRACE solutions provide the general long-term trends over the Amazon Basin (Fig. 5a). However, the magnitude of the trends might differ among the various solutions over smaller basins across different climate zones (Fig. 5b-d).
Long-term trends in GRACE TWSA. The long-term trends in GRACE TWSA were estimated for all solutions from April 2002 to December 2021 (common period for all solutions) over the 120 basins. Figure 6 summarizes the long-term trends for all 120 basins.
Overall, the maps with basin-level trends show similar results among the five different solutions through most of the 120 river basins. A notable difference is seen over the Yukon River basin (basin n. 028), located between the Yukon Territory in Canada and Alaska in the United States, at which HHU-IPM (Fig. 6d)